SUMMARY


In this paper we wish to show that the fundamental problem
of determining the utility of a communication channel in conveying
information can be interpreted as a problem within the framework
of multi-stage decision processes of stochastic type, and as such
may be treated by means of the theory of dynamic programming.


We shall begin by formulating some aspects of the general
problem in terms of multi-stage decision processes, with brief
descriptions of stochastic allocation processes and learning
processes. Following this, as a simple example of the applicability
of the techniques of dynamic programming, we shall discuss
in detail a problem posed recently by Kelly. In that paper, it
is shown by Kelly that under certain conditions, the rate of
transmission, as defined by Shannon, can be obtained from a certain
multi-stage decision process with an economic criterion. Here we
shall complete Kelly's analysis in some essential points, using
functional equation techniques, and considerably extend his results.
ON THE ROLE OF DYNAMIC PROGRAMMING IN STATISTICAL

COMMUNICATION THEORY

1. Introduction.

In this paper we wish to show that the fundamental problem
of determining the utility of a communication channel in conveying
information can be interpreted as a problem within the framework
of multi-stage decision processes of stochastic type, and as such
may be treated by means of the theory of dynamic programming,
[1], [2], [3].

This paper is to be envisaged as a step in the direction of
a broad theory of communication, as contemplated by N. Wiener in
his recent article, [12], and following the pioneering efforts
of Rice, [8], and Shannon, [10]. Among other steps along this
path, we would like to cite the recent articles of Bussgang and
Middleton, and Middleton and Van Meter, which employ the modern
theory of statistical decision functions and sequential analysis,
due to Wald, [11].

We shall begin by formulating some aspects of the general
problem in terms of multi-stage decision processes, with brief
descriptions of stochastic allocation processes and learning
processes. Following this, as a simple example of the applicability
of the techniques of dynamic programming, we shall discuss in detail
a problem posed recently by Kelly, [6]. In that paper, it is shown
by Kelly that under certain conditions, the rate of transmission,
as defined by Shannon, [10], can be obtained from a certain
multi-stage decision process with an economic criterion. Here we
shall complete Kelly's analysis in some essential points, using
functional equation techniques, and considerably extend his results.

We shall consider, in addition to the original problem of
Kelly, a time-dependent case, a process involving correlated
signals, and a multi-signal case, in both discrete and continuous
versions. It will be seen that the logarithmic criterion function
plays an extremely important role, since its special functional
properties permit us to obtain explicit representations
for both maximum return and optimal policy.

Finally we discuss briefly a functional equation arising
from the general question of defining the "value" of a
communication channel in a fashion which is independent of its use.

2. The Underlying Model.

Let us begin by constructing a simple model of one aspect
of the general communication problem. It will be reasonably
clear from what follows how more intricate models may be
constructed to take account of more complicated systems.

Consider a source S which produces at discrete times* a
sequence of pure signals, together with noise, which may be of
either stochastic or deterministic type, depending upon our further
assumptions concerning the structure of the system. The combined
signal is fed into a black box which we call a "communication
channel", which, in turn, emits a signal. This output signal
is observed.

On the basis of the observation of the output signal, it is
desired to make various deductions concerning the properties of
the original pure signal.

*The case of continuous signal emission can also be treated by the
methods outlined below, at the expense of the introduction of more
sophisticated concepts. We prefer to keep the mathematical level
moderate in this first discussion; however, see §13.

Schematically,
     Source  -->  Communication Channel  -->  Observer


In mathematical terms, let

(1)  x  = the pure signal emanating from S,
     r  = the noise associated with the signal,
     x' = F(x,r), the input to the communication system,
     y  = the signal transmitted to the observer by the
          communication channel.

Let us further write

(2)  y = T(x') = T(F(x,r)),

where T represents the transformation of the input signal x' due
to the communication channel.

Consider the set of all communication systems, or, equivalently,
the space of all associated transformations, T. We wish
to introduce an ordering, or, what is much more satisfactory when
possible, a metric which will enable us to compare two
communication systems, and to evaluate their performance, see §12.

It must be stressed at the very outset of an investigation
of this type that it should be possible to accomplish this aim in
an unlimited number of ways, dependent upon the source, channel,
nature of the observer, and the personal philosophies involved,
that is to say, upon the utility scales employed.


P-9 ^9

Revised 12-19-56

In the following sections, we shall present two alternate
methods for evaluating a communication system. Although each
is a particular case of a more general scheme, which we shall
discuss subsequently, it is worthwhile to present them separately
first, as they occur in important applications. In this way, we
hope to avoid the usual risk of obscuring the issue by extreme
generality.

3. A Stochastic Allocation Process.

Let us assume that the observer has a sum of money, or
resources of other types, which we denote by the vector x, called
the state vector. Upon receiving a y-signal, the observer is
required to make an allocation of resources to various activities.
The effect of this allocation is to change x into R(x,y), a
stochastic vector whose distribution we shall assume here to be
known. The case in which the distribution is not known is closely
allied with the second model we shall discuss.

The process is now repeated N times, where N may be fixed,
the simplest case, or the number of stages may depend upon the
process itself as a consequence of a preassigned stop rule. Let
us again consider only the simplest case, that of fixed N.

Further, let us suppose that the purpose of the observer
in carrying out this process is to maximize the expected value
of some function of his final state vector, the state attained
after N stages of the process.

Let f_T denote this maximum expected value, and f_I the maximum
expected value when the transformation T is the identity
transformation, the case in which we have a distortionless
communication channel.

Let us then agree to measure the worth of the original
communication system by means of a preassigned function of f_T
and f_I. In this fashion we introduce a metric into the space
of transformations T, and thus into the set of communication
channels. The simplest cases are those where we use a function
of f_I - f_T, or a function of f_T/f_I.

We shall discuss a simple case of a process of the above
type in later sections. For the formulation and mathematical
discussion of some particular processes of this general type we
refer to [9] and [4].

4. A Stochastic Learning Process.

Let us now consider a different type of stochastic process.

The observer is required to make a decision concerning the nature
of the pure signal emitted by S. He can observe as many samples
of the signal emitted by the communication system as he wishes,
subject to constraints imposed by the costs of observation, and
by limitations of time.

As a result of these decisions, he makes an estimate concerning
properties of the pure signal, and thereby incurs a cost
dependent upon the deviation of this estimate from the actual
situation.

The problem is to carry out the process of first observation
and then estimation so as to minimize the expected total cost,
where the total cost is a given function of the costs of observation
and the cost of deviation.

The theory of sequential analysis is devoted to one aspect
of this general problem. Other aspects arise in the theory of
learning processes, cf. [6] and [4].

It is clear that we can define the worth of the communication
channel in a manner completely analogous to the procedure
discussed above.

5. A More General Process.

It is clear that both processes are particular cases of a
more general process where neither the structure of, nor the
transformation due to, the communication channel is completely
known. Each stage of the process yields a certain return, which
may be negative, in resources, and yields additional information,
which may also be negative, concerning the intrinsic structure
of the combined system.

The problem is to carry out the sequence of decisions so as
to maximize some function, which may not necessarily be completely
known, of the total returns and the information pattern.

It is interesting to observe that, posed in this way, we
encounter one of the basic problems of experimental research.

6. Discussion.

For the above approach to be fruitful, and to represent more
technology than tautology, one must possess mathematical techniques
capable of formulating in precise terms, and treating, processes
of the kind described above.

The theory of sequential analysis developed by Wald, Wolfowitz,
Blackwell and Girshick provides an approach to one class
of problems of this type, while a general approach to these
multi-stage decision processes is provided by the theory of dynamic
programming of [1], [2], [3].

As an application of these general methods, we shall consider
a simple interesting model proposed recently by Kelly, [6], and
some generalizations.

7.  The  Model  of  Kelly. 

Let us begin by treating the first problem posed by Kelly.

A gambler receives advance information concerning the outcomes
of a sequence of independent sporting events over a noisy
communication channel. We assume that the outcome of each event is
the result of play between two evenly matched teams, and that p
is the probability of a correct transmission, and q = (1-p) the
probability of an incorrect transmission.

Assuming that the gambler starts with an initial amount x
and bets on the outcome of each event so as to maximize his
expected capital at the end of N stages of play, it is clear that
he wagers his entire fortune on each play if p > 1/2, and nothing
if p < 1/2. If p = 1/2, it makes no difference what policy he
employs. (We are supposing that the gambler must bet on the
received signal, if at all. It is easy to see that if we allow
him complete freedom in placing bets, then, in the case where
p < 1/2, his bet will always be contrary to the information he
receives.)
A much more difficult process arises if we take p to be a
fixed, but unknown quantity which must be determined on the basis
of the observed results of betting. This leads to a "learning
process." An expository treatment containing a number of additional
references may be found in Robbins, [9], while a treatment by
dynamic programming of a similar problem may be found in [4].

If 1/2 < p < 1, a gambler who wagers his entire fortune on each
play will, with probability one, eventually go broke.
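
The tension between a large expected capital and almost certain ruin
can be made concrete with a short calculation. The following Python
sketch is only an illustration under the assumptions of this section
(even-money bets, independent events); the function name
all_in_outcome is hypothetical and does not come from the text.

```python
def all_in_outcome(x, p, n):
    """Outcome of wagering the entire fortune on each of n even-money bets.

    Returns (expected_capital, survival_probability). The gambler still
    holds capital after n stages only if all n transmissions were correct,
    in which case the capital has doubled n times.
    """
    survival_probability = p ** n
    expected_capital = survival_probability * (2 ** n) * x
    return expected_capital, survival_probability


if __name__ == "__main__":
    for n in (1, 10, 50):
        mean, alive = all_in_outcome(1.0, 0.6, n)
        print(n, mean, alive)
```

For 1/2 < p < 1 the expected capital (2p)^n x grows without bound,
while the probability p^n of not yet being ruined tends to zero.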

Let us now assume that the above mode of play appears too
hazardous to the gambler, and that he wishes to pursue a more
conservative policy, one that will prevent him from ever being
wiped out. He may then proceed to maximize the expected value
of the logarithm of his capital at the end of N stages of play,
see §14.

For the one-stage process, he is faced with the problem of
maximizing

(1)  E_1(y) = p \log(x+y) + q \log(x-y)

over all y in [0,x]. Here y is the amount wagered, fair odds
being assumed. It is easy to see that, if p > q, we have

(2)  y = (p-q)x,

and for that value of y

(3)  E_1 = \log x + \log 2 + p \log p + q \log q.

If p \le q, the maximum is at y = 0.
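
The one-stage result can be checked numerically. The sketch below
is a minimal illustration, not part of the original analysis: it
maximizes p log(x+y) + q log(x-y) over a grid of wagers and compares
the outcome with y = (p-q)x and with the constant
log 2 + p log p + q log q. The names kelly_constant and
one_stage_grid_max, and the grid size, are assumptions made here.

```python
import math


def kelly_constant(p):
    """log 2 + p log p + q log q if p > q, and 0 otherwise."""
    q = 1.0 - p
    if p <= q:
        return 0.0
    k = math.log(2.0) + p * math.log(p)
    if q > 0.0:  # q log q tends to 0 as q tends to 0
        k += q * math.log(q)
    return k


def one_stage_grid_max(x, p, steps=2000):
    """Maximize p log(x+y) + q log(x-y) over the wagers y = x*i/steps."""
    q = 1.0 - p
    best_value, best_y = -math.inf, 0.0
    for i in range(steps):  # y = x is excluded because log(0) is undefined
        y = x * i / steps
        value = p * math.log(x + y) + q * math.log(x - y)
        if value > best_value:
            best_value, best_y = value, y
    return best_value, best_y


if __name__ == "__main__":
    print(one_stage_grid_max(3.0, 0.75))
    print(math.log(3.0) + kelly_constant(0.75))
```

With x = 3 and p = 0.75 the grid search settles on y = 1.5, which is
(p-q)x, and the maximum agrees with log x plus the constant above.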

It is not difficult to show that if we consider N-stage
processes, where we restrict ourselves to wagering policies
which require the wagered amount to be a fixed proportion of the
total capital at each stage, then the policy described above is
optimal. This result was established by Kelly, [6], in a very
ingenious fashion.

Let us now demonstrate that this policy is optimal within
the class of all wagering policies.

8. Economic Forecasting.

It is clear that the above mathematical model is abstractly
identical with problems that arise in connection with economic
forecasting, in particular, and with forecasting, in general, as
for example weather prediction.

In these cases, the physical world is the source and the
scientific corps, both experimental and theoretical, the
communication channel. Sometimes, the theorist or experimenter is also
the observer; at other times, it is the business man or politician
who must decide to what extent he trusts his communication channel.

9. Dynamic Programming Approach.*

Let us begin by formulating the problem in dynamic programming
terms. Define the following sequence of functions,

(1)  f_N(x) = expected value of the logarithm of the final
              capital obtained from an N-stage process starting
              with an initial capital x and using an
              optimal policy.

An optimal policy is here defined as one which maximizes the
expected value of the logarithm of the capital at the end of N
stages. Using the principle of optimality, [2], we obtain the
recurrence relations

(2)  f_1(x) = \log x + K,
     f_N(x) = \max_{0 \le y \le x} [ p f_{N-1}(x+y) + (1-p) f_{N-1}(x-y) ],  N \ge 2,

where

(3)  K = \log 2 + p \log p + q \log q,  p > q,
     K = 0,  p \le q.

*The results contained in this section answer the fundamental
question posed by Kelly on p. 926 of [6].


Let us now demonstrate the

Theorem: For N \ge 1, we have

(4)  f_N(x) = \log x + NK,

where K is defined as above. The optimal policy is unique, and
independent of N. It consists of choosing

(5)  (a) y = (p-q)x,  p > q,
     (b) y = 0,  p \le q.

Proof. Let us proceed inductively, beginning with the known
result for N = 1. Assuming that the result holds for N, we have

(6)  f_{N+1}(x) = \max_{0 \le y \le x} [ p [\log(x+y) + NK] + (1-p) [\log(x-y) + NK] ]
                = \max_{0 \le y \le x} [ p \log(x+y) + (1-p) \log(x-y) ] + NK.

The remaining maximum is that of the one-stage process, so that

(7)  f_{N+1}(x) = (\log x + K) + NK = \log x + (N+1)K.

The statement concerning the form of the optimal policy
follows from the analytic form of f_N(x).
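
The theorem can also be checked by brute force for small N, without
assuming the closed form in advance. The following Python sketch is
such a check under stated assumptions: wagers are restricted to a
finite grid, 0 < p < 1, and the function names f and closed_form are
introduced here for illustration only.

```python
import math


def f(n, x, p, steps=200):
    """Brute-force f_n(x) from the recurrence, maximizing over a wager grid.

    Wagers are y = x * i / steps for i = 0, ..., steps - 1 (y = x is left
    out because log(0) is undefined). No closed form is assumed here.
    """
    q = 1.0 - p
    best = -math.inf
    for i in range(steps):
        y = x * i / steps
        if n == 1:
            value = p * math.log(x + y) + q * math.log(x - y)
        else:
            value = p * f(n - 1, x + y, p, steps) + q * f(n - 1, x - y, p, steps)
        best = max(best, value)
    return best


def closed_form(n, x, p):
    """log x + n K, with K as defined in the text (requires 0 < p < 1)."""
    q = 1.0 - p
    k = math.log(2.0) + p * math.log(p) + q * math.log(q) if p > q else 0.0
    return math.log(x) + n * k


if __name__ == "__main__":
    print(f(2, 2.0, 0.75), closed_form(2, 2.0, 0.75))
```

The cost of this direct recursion grows like steps to the power n,
so it is practical only for very small n; the point of the theorem is
precisely that no such computation is needed.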

Now that the "best" performance of the noisy channel has been
determined, it may be compared in various possible ways with the
performance of a perfect channel.
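
One such comparison, using the difference f_I - f_T of §3, is
sketched below in Python. It is an illustration only: the perfect
channel is taken to be p = 1, for which K = log 2, and the function
names are hypothetical. Under these assumptions the shortfall per
stage equals -(p log p + q log q) for p > 1/2, the entropy of the
binary channel in natural units.

```python
import math


def optimal_log_growth(p):
    """K for a channel with probability p of correct transmission."""
    q = 1.0 - p
    if p <= q:
        return 0.0
    entropy = -sum(t * math.log(t) for t in (p, q) if t > 0.0)
    return math.log(2.0) - entropy


def shortfall_from_perfect(p, n):
    """f_I(x) - f_T(x) after n stages; the initial capital x cancels."""
    return n * (optimal_log_growth(1.0) - optimal_log_growth(p))


if __name__ == "__main__":
    print(shortfall_from_perfect(0.75, 10))
```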


10. Generalizations - I. Time Dependent Case.

Before proceeding to more general cases, let us consider a
simple extension of the above model.



To begin with, let us suppose that at the k-th stage, the
probability of correct transmission is p_k, and of incorrect
transmission q_k = 1 - p_k. For fixed N, define the sequence of
functions

(1)  f_k(x) = expected value of the logarithm of the final
              capital obtained from the remaining k stages
              of the original N-stage process, starting with
              an initial capital x, and using an optimal policy.

Then

(2)  f_1(x) = \max_{0 \le y \le x} [ p_N \log(x+y) + q_N \log(x-y) ],
     f_k(x) = \max_{0 \le y \le x} [ p_{N-k+1} f_{k-1}(x+y) + q_{N-k+1} f_{k-1}(x-y) ],  k = 2, 3, ..., N.

As before, it follows inductively that

(3)  f_k(x) = \log x + k \log 2 + \sum_{r=N-k+1}^{N} [ p_r \log p_r + q_r \log q_r ],

provided that p_k > 1/2 for k = 1,2,...,N. Whenever this condition
fails, the term p_k \log p_k + q_k \log q_k must be replaced by
(-\log 2).
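
The closed form for the time-dependent case amounts to adding one
stage constant per remaining stage. A minimal Python sketch of this
bookkeeping follows; the list layout probs[r - 1] = p_r and the
function names are assumptions made for illustration.

```python
import math


def stage_constant(p):
    """log 2 + p log p + q log q when p > 1/2, and 0 otherwise."""
    q = 1.0 - p
    if p <= 0.5:
        return 0.0
    tail = q * math.log(q) if q > 0.0 else 0.0
    return math.log(2.0) + p * math.log(p) + tail


def f_k(x, k, probs):
    """f_k(x) for the last k stages of an N-stage process.

    probs[r - 1] holds p_r for r = 1, ..., N (a hypothetical list layout).
    """
    n = len(probs)
    remaining = probs[n - k:]  # p_{N-k+1}, ..., p_N
    return math.log(x) + sum(stage_constant(p) for p in remaining)


if __name__ == "__main__":
    print(f_k(2.0, 3, [0.9, 0.4, 0.75]))
```

In the example, the second stage has p_2 = 0.4 < 1/2, so it
contributes nothing, in agreement with the replacement rule stated
above.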

11. Generalizations - II. Correlation.

Let us now consider the case where the signals are not
independent. The simplest case, perhaps, is that where the
probability of correct transmission p_k depends upon whether or not
the preceding signal was transmitted correctly. Although a large
variety of questions of this type may be formulated, we feel
that the following discussion will illustrate the uniform method
which may be employed to treat them.

Let

(1)  p_k = probability of correct transmission of the k-th
           signal, if the (k-1)st signal was transmitted
           correctly,
     r_k = probability of correct transmission of the k-th
           signal, if the (k-1)st signal was transmitted
           incorrectly.

Define the sequence of functions,

(2)  f_k(x) = expected value of the logarithm of the final
              capital obtained from the remaining k stages
              of the original N-stage process, starting with
              an initial capital x, and the information that
              the (k-1)st signal was transmitted correctly,
              and using an optimal policy,
     g_k(x) = the corresponding function in the case where
              the (k-1)st signal was transmitted incorrectly.

Then, as above,

(3)  f_k(x) = \max_{0 \le y \le x} [ p_{N-k+1} f_{k-1}(x+y) + (1-p_{N-k+1}) g_{k-1}(x-y) ],
     g_k(x) = \max_{0 \le y \le x} [ r_{N-k+1} f_{k-1}(x+y) + (1-r_{N-k+1}) g_{k-1}(x-y) ].

It follows inductively, as before, that

(4)  f_k(x) = \log x + a_k,
     g_k(x) = \log x + b_k,

where the recurrence relations for the a_k and b_k are readily
established.
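
The text does not write out the recurrence relations for a_k and
b_k. One way to obtain them, offered here as a reconstruction rather
than as the authors' own formulas, is to substitute
f_{k-1}(x) = log x + a_{k-1} and g_{k-1}(x) = log x + b_{k-1} into
the recurrence relations and to use the one-stage result, which gives
a_k = K(p_{N-k+1}) + p_{N-k+1} a_{k-1} + (1 - p_{N-k+1}) b_{k-1}, and
the same with r in place of p for b_k. The Python sketch below
assumes a_0 = b_0 = 0 and bets placed only on the received signal;
all names are hypothetical.

```python
import math


def stage_constant(p):
    """Max over 0 <= t < 1 of p log(1+t) + (1-p) log(1-t)."""
    q = 1.0 - p
    if p <= 0.5:
        return 0.0
    tail = q * math.log(q) if q > 0.0 else 0.0
    return math.log(2.0) + p * math.log(p) + tail


def correlated_constants(p, r):
    """Return (a_N, b_N) for lists p, r holding p_1..p_N and r_1..r_N.

    Assumes a_0 = b_0 = 0, i.e. f_0(x) = g_0(x) = log x.
    """
    n = len(p)
    a, b = 0.0, 0.0
    for k in range(1, n + 1):
        pk, rk = p[n - k], r[n - k]  # p_{N-k+1} and r_{N-k+1}
        a, b = (
            stage_constant(pk) + pk * a + (1.0 - pk) * b,
            stage_constant(rk) + rk * a + (1.0 - rk) * b,
        )
    return a, b


if __name__ == "__main__":
    print(correlated_constants([0.8, 0.7], [0.6, 0.55]))
```

When r_k = p_k for every k, the correlation disappears and both
constants reduce to the sum of the stage constants, as in the
time-dependent case.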
12. Generalizations - III. M-Signal Channels.

Let us now consider the situation in which the channel is
called upon to transmit any of M different symbols. Upon receiving
a symbol the gambler must make bets on what he believes the
transmitted signal to have been. Assume that the gambler possesses
the following information:

     p_ij = the conditional probability that the j-signal
            was sent if the i-signal is received,
     q_i  = the probability of receiving the i-signal,
     r_j  = the return from a unit winning bet on signal j.

Finally, let us assume that the gambler is free to bet the amount
z_i \ge 0 on the i-th signal, subject to the restriction that
\sum_{i=1}^{M} z_i \le x. As before, the gambler proceeds so
as to maximize the expected value of the logarithm of his capital
after N stages.

Defining the sequence {f_N(x)} as above we obtain the relations

     f_N(x) = \sum_{i=1}^{M} q_i \max_{\sum z_s \le x, z_s \ge 0} \sum_{j=1}^{M} p_{ij} f_{N-1}( r_j z_j + x - \sum_{s=1}^{M} z_s ),  N \ge 2,

and

     f_1(x) = \sum_{i=1}^{M} q_i \max_{\sum z_s \le x, z_s \ge 0} \sum_{j=1}^{M} p_{ij} \log( r_j z_j + x - \sum_{s=1}^{M} z_s ).

In this case we prove inductively that

     f_N(x) = \log x + NK,

where

     K = \sum_{i=1}^{M} q_i \max_{\sum z_s \le 1, z_s \ge 0} \sum_{j=1}^{M} p_{ij} \log( r_j z_j + 1 - \sum_{s=1}^{M} z_s ).
From the expression for K it is clear that the optimal
policy depends only on p_ij and r_j and not on the q_i, though the
return itself does also depend on q_i. An interesting special
case is that in which it is required that \sum_{i=1}^{M} z_i = x;
that is, the gambler is required to wager all of his available
funds. In this case the optimal policy depends only on (p_ij),
i.e., on the communication channel, and not at all on q_i or on
r_j. If we now introduce

     p_i  = the probability of sending an i,
     t_ij = the conditional probability that if an i is sent,
            then j is received,

we then have

     p_i = \sum_{j=1}^{M} q_j p_{ji},
     q_i = \sum_{j=1}^{M} p_j t_{ji},

and (p_ij) is the inverse of (t_ij).

Consequently, in this case the optimal policy is dependent only
on (t_ij), which characterizes the communication channel and is
independent of both the source, characterized by p_i, and the outside
world, characterized by the odds, r_i. The return, however, does
depend on all these quantities.

These considerations are significant, for they imply that
the gambler's actions are controlled solely by the quality of
the communication channel, though his ultimate return is determined
by the situation in toto. This leads to the possibility of
comparing two channels under the same conditions or of evaluating
the performance of a given channel under various conditions.

Specializations to the unsymmetric binary channel are
immediate.
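
The special case in which all funds must be wagered can be
illustrated numerically. With unit capital the quantity to be
maximized for a received i-signal is the sum over j of
p_ij log(r_j z_j), subject to the z_j summing to one. A standard
Lagrange multiplier argument, which is not spelled out in the text,
gives z_j = p_ij, independently of the odds r_j. The Python sketch
below compares this allocation with a few alternatives; the numbers
and function names are hypothetical, and every p_ij is assumed
positive.

```python
import math


def expected_log_return(z, p_row, r):
    """Sum_j p_ij log(r_j z_j) for unit capital with every z_j > 0."""
    return sum(pj * math.log(rj * zj) for pj, rj, zj in zip(p_row, r, z))


def proportional_bets(p_row):
    """Candidate optimum when all funds are wagered: z_j = p_ij."""
    return list(p_row)


if __name__ == "__main__":
    p_row = [0.5, 0.3, 0.2]
    odds = [2.0, 3.0, 6.0]
    print(expected_log_return(proportional_bets(p_row), p_row, odds))
    print(expected_log_return([0.4, 0.4, 0.2], p_row, odds))
```

Changing the odds changes the value of the return but not the
allocation at which it is largest, which is the independence
asserted above.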


13. Generalizations - IV. Continuum of Signals.

Consider now the case where there is a continuum of different
signals. Let

(1)  dG(u,v) = the conditional probability that a signal
               with label between v and v + dv is sent if
               the u-signal is received, -\infty < u, v < \infty,

and let

(2)  dH(u) = the probability that a signal with label between
             u and u + du is received at any stage.

Then, considering the process corresponding to the special
case discussed above, even bets being assumed for the sake of
simplicity, we derive the recurrence relations

(3)  f_N(x) = \int_{-\infty}^{\infty} [ \max_{z(v)} \int_{-\infty}^{\infty} f_{N-1}(2z(v)) dG(u,v) ] dH(u),
     f_1(x) = \int_{-\infty}^{\infty} [ \max_{z(v)} \int_{-\infty}^{\infty} \log(2z(v)) dG(u,v) ] dH(u).

In both cases, the maximization is over all functions z(v)
satisfying the conditions

(4)  (a) z(v) \ge 0,
     (b) \int_{-\infty}^{\infty} z(v) dv = x.

As above, it is easily seen inductively that

(5)  f_N(x) = \log 2^N x + KN,

where

(6)  K = \int_{-\infty}^{\infty} [ \max_{z(v)} \int_{-\infty}^{\infty} \log z(v) dG(u,v) ] dH(u),

and

(7)  (a) z(v) \ge 0,
     (b) \int_{-\infty}^{\infty} z(v) dv = 1.


14. Criterion Functions Yielding Invariant Policies.

We have seen above that the linear function yields an
invariant policy at each stage, and likewise the logarithm. It
is of interest to determine all criterion functions possessing
this property.

The following version of the problem will be treated here.
Let \phi(x) be a monotone increasing concave function defined
over 0 < x < 1. Consider the one-stage process where we wish
to maximize

(1)  E(y) = p \phi(x+y) + (1-p) \phi(x-y).

The function E(y) is concave as a function of y for 0 \le y \le x,
0 < x \le 1, and thus has a unique maximum, unless \phi(x) is linear
and p = 1/2. Let us dismiss the case of linearity by requiring
strict concavity, \phi''(x) < 0, and take p > 1/2.

Let us assume that, for all x in 0 < x < 1, there is a solution
of

(2)  dE/dy = p \phi'(x+y) - (1-p) \phi'(x-y) = 0

having the form

(3)  y = r(p)x,

where r(p) is a nonnegative differentiable function of p for
1/2 < p < 1, possessing a continuous derivative.

Then (2) is equivalent to the functional equation

(4)  p \phi'(x(1+r(p))) = (1-p) \phi'(x(1-r(p))),

for 0 < x < 1, 1/2 < p \le 1.

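Let x(1+r(p)) = y. Then (4) reduces to

(5)  p \phi'(y) = (1-p) \phi'(u(p) y),

where u(p) = (1-r(p))/(1+r(p)).

We now differentiate first with respect to y, and then with
respect to p, obtaining the two equations

(6)  p \phi''(y) = (1-p) u(p) \phi''(u(p) y),
     \phi'(y) = -\phi'(u(p) y) + (1-p) y u'(p) \phi''(u(p) y).

Dividing the two equations, and using (5) to eliminate
\phi'(u(p) y), we obtain

(7)  y \phi''(y) / \phi'(y) = u(p) / ( p (1-p) u'(p) ).

Since the left side is a function of y and the right side a
function of p, both sides must be constant. Setting

(8)  y \phi''(y) / \phi'(y) = K,  K < 0,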

we obtain

(9)  \log \phi'(y) = K \log y + c_1.

Hence

(10)  \phi'(y) = c_2 y^K.




Without loss of generality, let us normalize, so that \phi'(1) = 1.
Then

(11)  \phi'(y) = y^K.

If K > -1, we have

(12)  \phi(y) = y^{K+1}/(K+1) + c_1'.

If K = -1, we have

(13)  \phi(y) = \log y + c'.

It is clear that K > -1 is necessary for \phi(y) to be
non-negative for y > 0. Finally, without loss of generality, we can
let c' = c_1' = 0.
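
The invariance of the policy for these criterion functions can be
checked directly. For a criterion whose derivative is y^K, with
K < 0, the stationarity condition p (x+y)^K = (1-p) (x-y)^K gives
(x-y)/(x+y) = (p/(1-p))^{1/K}, so that y/x does not depend on x. The
Python sketch below computes this fraction and the residual of the
stationarity condition; the function names are hypothetical and
1/2 < p < 1 is assumed.

```python
def invariant_fraction(p, k):
    """r(p) such that y = r(p) x solves p phi'(x+y) = (1-p) phi'(x-y)
    for phi'(y) = y**k, with k < 0 and 1/2 < p < 1."""
    u = (p / (1.0 - p)) ** (1.0 / k)  # u = (x - y) / (x + y), 0 < u < 1
    return (1.0 - u) / (1.0 + u)


def first_order_residual(x, p, k):
    """Left side minus right side of the stationarity condition at y = r(p) x."""
    y = invariant_fraction(p, k) * x
    return p * (x + y) ** k - (1.0 - p) * (x - y) ** k


if __name__ == "__main__":
    print(invariant_fraction(0.75, -1.0))
    print(first_order_residual(0.5, 0.75, -0.5))
```

For K = -1, the logarithmic case, the fraction reduces to p - q, in
agreement with the policy found for the model of Kelly.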

15. Discussion.

In the foregoing pages, we have essayed to describe some
applications of the concepts and techniques of the theory of
dynamic programming to various aspects of communication theory.
As simple illustrations we have considered a particular process
discussed by J. Kelly and various generalizations. In subsequent
papers, we propose to treat in greater detail some mathematical
models of greater scope.